Genome Research

Cold Spring Harbor Laboratory

Preprints posted in the last 7 days, ranked by how well they match Genome Research's content profile, based on 409 papers previously published here. The average preprint has a 0.15% match score for this journal, so anything above that is already an above-average fit.

1
GRASP: Gene-relation adaptive soft prompt for scalable and generalizable gene network inference with large language models

Feng, Y.; Deng, K.; Guan, Y.

2026-04-14 bioinformatics 10.1101/2025.10.20.683485 medRxiv
Top 0.7%
4.3%

Gene networks (GNs) encode diverse molecular relationships and are central to interpreting cellular function and disease. The heterogeneity of interaction types has led to computational methods specialized for particular network contexts. Large language models (LLMs) offer a unified, language-based formulation of GN inference by leveraging biological knowledge from large-scale text corpora, yet their effectiveness remains sensitive to prompt design. Here, we introduce Gene-Relation Adaptive Soft Prompt (GRASP), a parameter-efficient and trainable framework that conditions inference on each gene pair through only three virtual tokens. Using factorized gene-specific and relation-aware components, GRASP learns to map each pair's biological context into compact soft prompts that combine pair-specific signals with shared interaction patterns. Across diverse GN inference tasks, GRASP consistently outperforms alternative prompting strategies. It also shows a stronger ability to recover unannotated interactions from synthetic negative sets, suggesting its capacity to identify biologically meaningful relationships beyond existing databases. Together, these results establish GRASP as a scalable and generalizable prompting framework for LLM-based GN inference.

2
Deriving LD-adjusted GWAS summary statistics through linkage disequilibrium deconvolution

Nouira, A.; Favre Moiron, M.; Tournaire, M.; Verbanck, M.

2026-04-11 genetic and genomic medicine 10.64898/2026.04.10.26350574 medRxiv
Top 1.0%
3.6%

Genome-wide association studies (GWAS) have identified numerous genetic variants associated with complex traits. However, linkage disequilibrium (LD) confounds these associations, leading to false positives where non-causal variants appear associated because they are correlated with nearby causal variants. This is particularly the case in highly polygenic traits where the genome can be saturated in causal variants. To address this issue, we propose LDeconv, a method based on truncated singular value decomposition (SVD) that adjusts GWAS summary statistics without requiring individual-level genotype data. This approach accounts for LD structure, isolates causal variants in high-LD regions, and improves the reliability of effect size estimates. We assess its performance through simulations across various LD scenarios, conduct extensive sensitivity analyses, and apply it to real GWAS data from the UK Biobank. Our results demonstrate that LDeconv effectively reduces false discoveries while preserving true associations, offering a robust framework for post-GWAS analysis.
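
The abstract above describes adjusting summary statistics with a truncated SVD of the LD structure. The following is a minimal generic sketch of that idea, not the authors' LDeconv implementation: it assumes marginal statistics can be modelled as `z = R @ beta` for an LD correlation matrix `R`, and the function name `truncated_svd_deconvolve`, the rank parameter `k`, and the toy data are all hypothetical.

```python
import numpy as np

def truncated_svd_deconvolve(z, ld, k):
    """Approximately solve ld @ beta = z using only the top-k singular components.

    Illustrative assumption: marginal statistics z are the LD matrix applied
    to the underlying per-variant effects beta.
    """
    u, s, vt = np.linalg.svd(ld)
    # Pseudo-inverse restricted to the k largest singular values.
    inv_s = 1.0 / s[:k]
    return vt[:k].T @ (inv_s * (u[:, :k].T @ z))

# Toy example: variant 0 is causal, variant 1 is in high LD with it (r = 0.9).
ld = np.array([[1.0, 0.9, 0.0],
               [0.9, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
beta_true = np.array([1.0, 0.0, 0.0])
z = ld @ beta_true  # marginal signal leaks onto the non-causal variant 1

beta_full = truncated_svd_deconvolve(z, ld, k=3)   # keep every component
beta_trunc = truncated_svd_deconvolve(z, ld, k=2)  # drop the smallest one
print(beta_full, beta_trunc)
```

In this toy case, keeping all three components recovers the single causal effect, while dropping the smallest component splits the effect evenly between the two correlated variants. That illustrates the trade-off the truncation rank controls: stability against noise versus resolution within a high-LD block.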

3
Human Oncogene EWS::FLI1 Functions as a Pioneer Factor in Saccharomyces cerevisiae.

Velazquez, D.; Molnar, C.; Reina, J.; Mora, J.; Gonzalez, C.

2026-04-14 cancer biology 10.1101/2025.10.22.680884 medRxiv
Top 4%
0.7%

Ewing sarcoma (EwS) is an aggressive, human-exclusive tumor typically driven by the EWS::FLI1 fusion protein. To assess whether the neomorphic functions of EWS::FLI1 are fundamentally dependent on evolutionarily recent cofactors such as ETS transcription factors (ETS-TFs), Polycomb group (PcG) proteins, CBP/p300, or specific subunits of the BAF complex, we expressed EWS::FLI1 in the model organism Saccharomyces cerevisiae. This minimal system was chosen because several key EWS::FLI1 cofactors possess greatly reduced sequence homology (e.g., BAF) or are lacking altogether (e.g., ETS-TFs, PcG, or CBP/p300). We used co-IP/MS to map the yeast interactome, ChIP-seq to identify gDNA binding sequences, RNA-seq for global gene expression, and engineered reporters to test conversion of (GGAA) tandem repeats (GGAASat) into neoenhancers. We found that the yeast EWS::FLI1 interactome was more limited and qualitatively distinct from its human counterpart, sharing core machinery (e.g., RNA Polymerase II, FACT) but lacking the BAF/SWI-SNF and spliceosome complexes, and showing strong enrichment for the SAGA chromatin remodeling complex. We also found that EWS::FLI1 binds to hundreds of sites in the yeast genome with a clear preference for putative ETS-TF consensus sequences and (CA) dinucleotide repeats. Yet, EWS::FLI1-expressing cells presented only minimal transcriptional dysregulation, a stark contrast to the extensive changes observed in human and Drosophila cells. Finally, we found that EWS::FLI1 successfully converted silent GGAASat sequences into active enhancers in yeast. This remarkable result occurs despite the absence of homologs for key human activators, such as CBP/p300, strongly suggesting that EWS::FLI1 can mobilize functionally related, non-homologous pathways to establish neoenhancers at GGAASat sites.
Altogether, our results indicate that EWS::FLI1's core ability to drive GGAASat-dependent gene expression is a conserved, ancient property, while GGAASat-independent extensive transcriptome reprogramming is dependent on co-factors and pathways specific to animal cells.

4
Identification, evolutionary history and characteristics of orphan genes in root-knot nematodes

Seckin, E.; Colinet, D.; Bailly-Bechet, M.; Seassau, A.; Bottini, S.; Sarti, E.; Danchin, E. G.

2026-04-11 bioinformatics 10.64898/2025.12.19.695360 medRxiv
Top 4%
0.7%

Orphan genes, lacking homologs in other species, are systematically found across genomes. Their presence may result from extensive divergence from pre-existing genes or from de novo gene birth, which occurs when a gene emerges from a previously non-genic region. In this study, we identified orphan genes in the genomes of globally distributed plant-parasitic nematodes of the genus Meloidogyne and investigated their origins, evolution, and characteristics. Using a comparative genomics framework across 85 nematode species, we found that 18% of Meloidogyne genes are genus-specific, transcriptionally supported orphans. By combining ancestral sequence reconstruction and synteny-based approaches, we inferred that 20% of these orphan genes originated through high divergence, while 18% likely emerged de novo. Proteomic and translatomic evidence confirmed the translation of a subset of these genes, and feature analyses revealed distinctive molecular signatures, including shorter length, signal peptide enrichment, and a tendency for extracellular localization. These findings highlight orphan genes as a substantial and previously underexplored component of the Meloidogyne genome, with potential roles in their worldwide parasitism.

5
Vector2Variant: Discovery of Genetic Associations from ML Derived Representations without Phenotype Engineering

Sooknah, M.; Srinivasan, R.; Sankarapandian, S.; Chen, Z.; Xu, J.

2026-04-17 genetic and genomic medicine 10.64898/2026.04.10.26350624 medRxiv
Top 5%
0.6%

Genome-wide association studies (GWAS) have transformed our understanding of human biology, but are constrained by the need for predefined phenotypes. We introduce Vector2Variant (V2V), a general-purpose framework that transforms any set of high-dimensional measurements (such as machine learning embeddings) into a genome-wide scan for associations, without requiring rigid specification of a phenotype. Rather than testing genetic variants against single traits, V2V finds the axis in multivariate space along which carriers and non-carriers maximally differ, and produces a continuous "projection phenotype" that can be interpreted by association with disease labels. The projection phenotypes correlate with orthogonal clinical biomarkers never seen during training, suggesting the learned axes capture biologically meaningful variation. We applied V2V to imaging, timeseries, and omics modalities in the UK Biobank and recovered established biology (like the role of CASP9 in renal failure) without the need for targeted measurements, alongside novel associations including a frameshift variant in LRRIQ1 (potentially protective for cardiovascular disease). V2V is computationally efficient at genome-wide scale, producing summary statistics and disease associations that facilitate target prioritization without the need for phenotype engineering.

6
Why Invariant Risk Minimization Fails on Tabular Data: A Gradient Variance Solution

Mboya, G. O.

2026-04-13 epidemiology 10.64898/2026.04.09.26350513 medRxiv
Top 6%
0.5%

Machine learning models trained on observational data from one environment frequently fail when deployed in another, because standard learning algorithms exploit spurious correlations alongside causal ones. Invariant learning methods address this problem by seeking representations that support stable prediction across training environments, but their behavior on tabular data remains poorly characterized. We present CausTab, a gradient variance regularization framework for causal invariant representation learning on mixed tabular data. CausTab penalizes the variance of parameter gradients across training environments, providing a richer invariance signal than the scalar penalty used by Invariant Risk Minimization (IRM). We provide formal results showing that the gradient variance penalty is zero at causally invariant solutions and positive at solutions that rely on spurious features. Through experiments on synthetic data across three spurious-correlation regimes, four cycles of the National Health and Nutrition Examination Survey (NHANES), and four hospital systems in the UCI Heart Disease dataset, we demonstrate that: (1) IRM consistently degrades relative to standard empirical risk minimization (ERM) on tabular data, losing up to 13.8 AUC points in spurious-dominant settings, a failure we trace mechanistically to penalty collapse during training; (2) CausTab matches or exceeds ERM in every experimental condition; (3) CausTab achieves consistently better probability calibration than both ERM and IRM; and (4) invariant learning methods fail when environments differ in outcome prevalence rather than in spurious feature correlations, a boundary condition we characterize both empirically and theoretically. We introduce the Spurious Dominance Index (SDI), a practical scalar diagnostic for determining whether a dataset requires invariant learning, and validate it across all experimental settings.
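
The abstract above describes penalizing the variance of parameter gradients across training environments. The sketch below shows one plausible form of such a penalty for a plain logistic model. It is an illustration under stated assumptions, not CausTab itself: the helper names (`logistic_grad`, `gradient_variance_penalty`, `make_env`), the synthetic environments, and the choice to sum per-parameter variances are all hypothetical.

```python
import numpy as np

def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss with respect to weights w."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

def gradient_variance_penalty(w, envs):
    """Sum over parameters of the variance of per-environment gradients."""
    grads = np.stack([logistic_grad(w, X, y) for X, y in envs])
    return float(grads.var(axis=0).sum())

def make_env(rng, n, spurious_sign):
    """Toy environment: column 0 drives y; column 1 tracks y with a sign
    that can flip between environments (a spurious feature)."""
    x_causal = rng.normal(size=n)
    y = (x_causal > 0).astype(float)
    x_spurious = spurious_sign * (2 * y - 1) + 0.1 * rng.normal(size=n)
    return np.column_stack([x_causal, x_spurious]), y

rng = np.random.default_rng(0)
env_a = make_env(rng, 500, spurious_sign=+1)
env_b = make_env(rng, 500, spurious_sign=-1)
w = np.array([0.5, 0.5])

same = gradient_variance_penalty(w, [env_a, env_a])     # identical environments
shifted = gradient_variance_penalty(w, [env_a, env_b])  # spurious sign flips
print(same, shifted)
```

Identical environments yield a zero penalty, while flipping the sign of the spurious feature produces per-environment gradients that disagree, so the penalty becomes positive. In training, a term like this would typically be added to the pooled loss with a tunable weight.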

7
Single-molecule cfDNA sequencing establishes clinical utility for ecDNA monitoring and multimodal liquid biopsy analysis

Sauer, C. M.; Tovey, N.; Ptasinska, A.; Hughes, D.; Stockton, J.; Zumalave, S.; Rust, A. G.; Lynn, C.; Livellara, V.; Sevrin, F.; Himsworth, C.; Muyas, F.; Nicolaidou, M.; Parry, G.; Paisana, E.; Cascao, R.; Ahmed, S. W.; Yasin, S. A.; Portela, L. R.; Balasubramanian, P.; Burke, G. A. A.; Vedi, A.; Faria, C. C.; Marshall, L. V.; Jacques, T. S.; Hubank, M.; Hargrave, D.; George, S.; Angelini, P.; Anderson, J.; Chesler, L.; Beggs, A. D.; Cortes-Ciriano, I.

2026-04-12 oncology 10.64898/2026.04.08.26350410 medRxiv
Top 6%
0.5%

Cell-free DNA (cfDNA) profiling enables minimally invasive cancer detection and monitoring. We present SIMMA, a low-input single-molecule sequencing approach that enables multimodal whole-genome and high-depth targeted sequencing of the same cfDNA sample for both tumour-agnostic and tumour-informed liquid biopsy analysis. Across 792 plasma and cerebrospinal fluid cfDNA samples from 277 paediatric patients with diverse brain and extracranial tumours, SIMMA enabled tumour diagnosis, detection of driver mutations, and reconstruction of extrachromosomal DNA (ecDNA) months before clinical relapse. Using conformal prediction trained on genome-wide fragmentomics, genomic and epigenomic data, SIMMA predicts disease burden as a continuous variable and provides well-calibrated uncertainty estimates for each sample, achieving a limit of detection of ~100 ppm from low-pass whole-genome sequencing data. In summary, SIMMA establishes the clinical utility of multimodal cfDNA profiling with uncertainty quantification for individual patients and unlocks the potential of ecDNA as a liquid biopsy biomarker for disease detection and monitoring across diverse aggressive malignancies.

8
Maternal health and autism risk: parsing direct and indirect genetic effects using 3-generation family linkage

Arildskov, E. S.; Khachadourian, V.; Grove, J.; Schendel, D.; Hansen, S. N.; Janecka, M.

2026-04-17 psychiatry and clinical psychology 10.64898/2026.04.15.26350976 medRxiv
Top 6%
0.5%

Despite autism's prominent genetic etiology and early-life origins, parsing genetic effects contributing to the condition into those that operate directly (via allelic transmission to offspring) vs. indirectly (via influencing prenatal environment) remains challenging. We examined this using a novel design leveraging 3-generation family linkage in Danish national registers. The cohort included all children born in Denmark from 1998-2015 and their relatives identified through 3-generation family linkage. The analytic sample comprised full maternal cousin pairs, including parallel (children of mother's sister) and cross cousins (children of mother's brother). Exposures were diagnoses in the index mother previously associated with offspring autism; the outcome was autism diagnosis in cousins of the index child. We used Cox proportional hazards models to estimate associations separately in parallel and cross cousins, followed by comparisons of these hazard ratios to infer mechanisms. Several maternal diagnoses (e.g., postpartum hemorrhage, personality disorders, epilepsy) were associated with autism in both parallel and cross cousins, consistent with shared direct genetic effects. Other conditions (e.g., false labor, recurrent major depressive disorder, other anxiety disorders, systemic connective tissue involvement) showed stronger associations in parallel than cross cousins, supporting additional indirect genetic effects operating through the prenatal environment. Adjustment for the same diagnosis in the cousin's own mother did not substantially change estimates, providing no evidence for an additional role of non-genetic mechanisms associated with the diagnosis. These findings suggest that both direct and indirect genetic effects contribute to observed links between maternal health and offspring autism, highlighting etiologic heterogeneity and demonstrating a registry-based family design to separate these pathways without genetic data.

9
Genetic confounding in the associations between maternal health and autism

Arildskov, E. S.; Ahlqvist, V. H.; Khachadourian, V.; Asgel, Z.; Schendel, D.; Hansen, S. N.; Grove, J.; Janecka, M.

2026-04-17 epidemiology 10.64898/2026.04.16.26351033 medRxiv
Top 6%
0.5%

The etiology of autism is influenced by genetic and non-genetic factors, with observational studies suggesting associations between early maternal health diagnoses and offspring autism. However, these associations may partly reflect shared familial genetic liability rather than direct causal effects. Using comprehensive national health registers and individual-level genetic data from the iPSYCH cohort (N=117,542), we examined whether maternal health diagnoses are associated with offspring polygenic scores (PGS) for autism. Such associations between maternal health and offspring autism would indicate shared genetic factors and the possibility of genetic confounding in the observational associations. We also tested such associations with PGSs for other neuropsychiatric and neurodevelopmental conditions that are genetically correlated with autism, but with better-powered PGS (due to larger GWAS sample sizes and likely more polygenic genetic architecture), as well as height, a negative control. Several maternal diagnoses were nominally associated with autism PGS in the child, including, e.g., certain obstetric complications, asthma, and obesity. After adjustment for multiple testing, the only statistically significant results included those between maternal diagnoses, predominantly psychiatric, and other neuropsychiatric and neurodevelopmental PGSs in the child. Sensitivity analyses confirmed the robustness of our results across exposure windows, diagnostic settings, and socioeconomic adjustments. These findings indicate that maternal diagnoses associated with autism partially reflect shared genetic liabilities between mothers and their children. However, such genetic effects, as captured by child PGS, do not fully explain the observed associations, suggesting additional factors, e.g., non-genetic familial factors, rare variants, and indirect effects.

10
Culture-independent identification and serotyping of Streptococcus pneumoniae by targeted metagenomics in pleural fluid samples

Smith, S. A. M.; Rockett, R. J.; Oftadeh, S.; Tam, K. K.-G.; Payne, M.; Golubchik, T.; Sintchenko, V.

2026-04-16 epidemiology 10.64898/2026.04.13.26350812 medRxiv
Top 7%
0.4%

Streptococcus pneumoniae is the leading cause of empyema and pneumonia in children, and monitoring the effectiveness of polyvalent pneumococcal vaccines has been essential for controlling invasive pneumococcal disease (IPD) in children and elderly adults. Conventional serotyping of pneumococci has relied on the Quellung reaction following laboratory culture; however, more recently whole genome sequencing (WGS) has been implemented in many reference laboratories to enhance traditional typing. Pleural fluid samples from cases with empyema are often culture negative, limiting the utility of WGS and requiring polymerase chain reaction (PCR) or 16S rRNA sequencing to detect S. pneumoniae. These molecular methods have limited sensitivity and capacity to characterise pneumococcus in clinical samples, especially in specimens with a low pathogen abundance. This study applied capture-based enrichment (tNGS) to identify and characterise S. pneumoniae directly from pleural fluid samples. A total of 51 pleural fluid samples were subjected to tNGS with a custom probe panel, for 39 known positive fluids collected from IPD cases between 2018-2025 in New South Wales, Australia. tNGS results were benchmarked against molecular-based serotyping. Our tNGS achieved 100% sensitivity and specificity in detecting S. pneumoniae. Serotyping results were concordant with PCR, and 95% (37/39) of S. pneumoniae PCR-positive pleural fluid cases could be serotyped using tNGS. Standard molecular methods, however, could only determine serotype in 56% (22/39) of samples. This tNGS-enabled 39% improvement in the ability to directly identify and serotype IPD-associated serotypes of S. pneumoniae in difficult-to-culture pleural fluids can significantly enhance laboratory surveillance of IPD as well as our understanding of vaccine effectiveness.

11
Distinct Metabolic Signatures Distinguish Lung, Colorectal and Ovarian Cancer

Tsiara, I.; Vouzaxaki, E.; Ekström, J.; Rameika, N.; Yang, F.; Jain, A.; Iglesias Alonso, A.; Sjöblom, T.; Globisch, D.

2026-04-13 oncology 10.64898/2026.04.08.26350309 medRxiv
Top 11%
0.1%

Cancer-related casualties are the most common cause of death worldwide. The discovery of biomarkers is of utmost importance for diagnosis and disease monitoring. Herein, we performed a comprehensive metabolomics biomarker discovery effort in plasma from 615 lung, ovarian and colorectal cancer patients at diagnosis and 95 non-cancerous control subjects. This pan-cancer investigation identified specific panels of metabolites in the entire sample cohort with high discriminating power, demonstrated by combined ROC AUC values of up to 0.95. The identified metabolites are mainly associated with lipid and amino acid metabolism as well as xenobiotic transformation. These metabolite panels of high predictive power provide new metabolic insights into these cancers and demonstrate the potential of metabolomics for improved diagnosis and monitoring of disease progression.

12
Functional PD-1/PD-L1 engagement defines a spatial biomarker of immunotherapy response

Ullman, T.; Krantz, D.; Avenel, C.; Lung, M.; Svedman, F. C.; Holmsten, K.; Ostling, P.; Ullen, A.; Stadler, C.

2026-04-17 oncology 10.64898/2026.04.15.26350929 medRxiv
Top 11%
0.1%

Effective predictive biomarkers for immune checkpoint inhibitor (ICI) therapy remain an unmet need across solid tumors. Here, we present an integrated spatial proteomics workflow that combines in situ proximity ligation assay with multiplexed immunofluorescence to directly resolve PD-1/PD-L1 signaling events at the level of defined cellular phenotypes and their spatial organization within intact tumor tissue. Applied as a proof of concept to tumor samples from patients with metastatic urothelial carcinoma treated with pembrolizumab, this approach reveals that PD-1/PD-L1 interactions specifically involving cytotoxic CD8CD3 T cells are significantly enriched in complete responders, while such interactions are rare in patients with progressive disease. This interaction-defined T cell subset achieves superior discrimination of clinical response compared to single-marker PD-L1 expression or immune cell abundance alone. By integrating direct detection of protein-protein interactions with high-dimensional single-cell phenotyping, our workflow provides a mechanistically informed, spatially resolved biomarker of functional immune engagement. Beyond urothelial carcinoma, this platform establishes a generalizable framework for translating spatial signaling biology into predictive tools for immunotherapy response across tumor types.

13
Fine-Tuning PubMedBERT for Hierarchical Condition Category Classification

Wang, X.; Hammarlund, N.; Prosperi, M.; Zhu, Y.; Revere, L.

2026-04-15 health systems and quality improvement 10.64898/2026.04.13.26350814 medRxiv
Top 12%
0.1%

Automating Hierarchical Condition Category (HCC) assignment directly from unstructured electronic health record (EHR) notes remains an important but understudied problem in clinical informatics. We present HCC-Coder, an end-to-end NLP system that maps narrative documentation to 115 Centers for Medicare & Medicaid Services (CMS) HCC codes in a multi-label setting. On the test dataset, HCC-Coder achieves a macro-F1 of 0.779 and a micro-F1 of 0.756, with a macro-sensitivity of 0.819 and macro-specificity of 0.998. By contrast, Generative Pre-trained Transformer (GPT)-4o achieves its highest scores, a macro-F1 of 0.735 and a micro-F1 of 0.708, under five-shot prompting. The fine-tuned model demonstrates consistent absolute improvements of 4%-5% in F1-scores over GPT-4o. To address severe label imbalance, we incorporate inverse-frequency weighting and per-label threshold calibration. These findings suggest that domain-adapted transformers provide more balanced and reliable performance than prompt-based large language models for hierarchical clinical coding and risk adjustment.
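
The abstract above reports both macro-F1 and micro-F1 for a multi-label task with severe label imbalance. As a reminder of how the two averages differ, here is a small self-contained sketch. The function name `f1_scores` and the toy labels are hypothetical and are not taken from HCC-Coder; labels with no true or predicted positives are scored as 0 here, which is one common convention among several.

```python
import numpy as np

def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for binary multi-label indicator arrays
    of shape (n_samples, n_labels)."""
    tp = np.sum((y_true == 1) & (y_pred == 1), axis=0)
    fp = np.sum((y_true == 0) & (y_pred == 1), axis=0)
    fn = np.sum((y_true == 1) & (y_pred == 0), axis=0)
    denom = 2 * tp + fp + fn
    # Per-label F1; labels with an empty denominator score 0 (assumed convention).
    per_label = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    # Micro-F1 pools the counts over all labels before computing F1.
    micro = 2 * tp.sum() / max(2 * tp.sum() + fp.sum() + fn.sum(), 1)
    return float(per_label.mean()), float(micro)

# Toy data: label 0 is common and always predicted correctly,
# label 1 is rare and always predicted wrongly.
y_true = np.array([[1, 0], [1, 0], [1, 1], [1, 0]])
y_pred = np.array([[1, 0], [1, 0], [1, 0], [1, 1]])
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print(macro_f1, micro_f1)
```

In this toy case the macro average is 0.5 and the micro average is 0.8: macro-F1 weights every label equally, so a poorly predicted rare label pulls it down, whereas micro-F1 is dominated by the frequent label.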

14
Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations

Muller, B.; Ortiz Barranon, A. A.; Roberts, L.

2026-04-17 neurology 10.64898/2026.04.12.26350731 medRxiv
Top 12%
0.1%

Dysarthric speech severity assessment typically requires either trained clinicians or supervised machine learning models built from labelled pathological speech data, limiting scalability across languages and clinical settings. We present a training-free method (no supervised severity model is trained; feature directions are estimated from healthy control speech using a pretrained forced aligner) that quantifies dysarthria severity by measuring the degradation of phonological feature subspaces within frozen HuBERT representations. For each speaker, we extract phone-level embeddings via Montreal Forced Aligner, compute d scores along phonological contrast directions (nasality, voicing, stridency, sonorance, manner, and four vowel features) derived exclusively from healthy control speech, and construct a 12-dimensional phonological profile. Evaluating 890 speakers across 10 corpora, 5 languages for the full MFA pipeline (English, Spanish, Dutch, Mandarin, French) and 3 primary aetiologies (Parkinson's disease, cerebral palsy, amyotrophic lateral sclerosis), we find that all five consonant d features correlate significantly with clinical severity (random-effects meta-analysis rho = -0.50 to -0.56, p < 2 x 10^-4; pooled Spearman rho = -0.47 to -0.55 with bootstrap 95% CIs not crossing zero), with the effect replicating within individual corpora, surviving FDR correction, and remaining robust to leave-one-corpus-out removal and alignment quality controls. Nasality d decreases monotonically from control to severe in 6 of 7 severity-graded corpora. Mann-Whitney U tests confirm that all 12 features distinguish controls from severely dysarthric speakers (p < 0.001). The method requires no dysarthric training data and applies to any language with an existing MFA acoustic model (currently 29 languages) or a model trained from healthy speech alone. It produces clinically interpretable per-feature profiles.
We release the full pipeline and phone feature configurations for six languages to support replication and clinical adoption. Author Summary: One of the authors has lived with ALS for sixteen years. Bernard Muller, who built this entire analytical pipeline using only eye-tracking technology, has experienced the progression of the disease firsthand, including the dysarthric speech that comes with advancing ALS and the tracheostomy that followed. The problem this paper addresses is not abstract to him, and that shapes how the method was designed. We developed a method to measure how well a person with dysarthria can produce distinct speech sounds, without needing any recordings of disordered speech for training. Our approach works by analysing how a widely available AI speech model organises different sound categories -- such as nasal versus oral consonants, or voiced versus voiceless sounds -- and measuring whether those categories become harder to tell apart. We tested this on 890 speakers across 10 datasets in five languages, covering Parkinson's disease, cerebral palsy, and ALS. Because the method only needs healthy speech recordings to set up, it applies to any language with an existing acoustic model, currently covering 29 languages. The resulting profiles show clinicians which specific aspects of speech production are degrading, rather than providing a single opaque severity score. This could support remote monitoring of speech decline in neurodegenerative disease and enable screening in languages and settings where specialist assessment is unavailable.

15
Heterogeneous, Population-Level Drug-Tolerant Persisters Exhibit Ion-Channel Remodeling and Ferroptosis Susceptibility

Hayford, C. E.; Baleami, B.; Stauffer, P. E.; Paudel, B. B.; Al'Khafaji, A.; Brock, A.; Quaranta, V.; Tyson, D. R.; Harris, L. A.

2026-04-13 systems biology 10.1101/2022.02.03.479045 medRxiv
Top 13%
0.1%

Drug-tolerant persisters (DTPs) represent a major obstacle to durable responses in targeted cancer therapy. DTPs are commonly described as distinct single-cell states that survive drug treatment via reversible, non-genetic mechanisms and drive tumor recurrence. Recent work demonstrates that multiple DTPs can coexist, reflecting diversity in lineage, signaling programs, or stress responses. However, each DTP is still generally viewed as a uniform cellular phenotype. Building on our prior work describing a population-level DTP termed "idling" [Paudel et al., Biophys. J. (2018) 114, 1499-1511], here we present evidence supporting a fundamentally different view: that DTPs are not single-cell states, but rather heterogeneous populations composed of multiple sub-states with distinct division and death rates that balance to produce near-zero net population growth. Using single-cell transcriptomics and lineage barcoding, we identify multiple phenotypic states within idling DTP populations, with reduced heterogeneity compared to untreated populations, and find that idling DTP cells emerge from nearly all lineages. Transcriptomic and functional analyses further reveal altered ion-channel activity in idling DTPs, which we confirm experimentally. Moreover, drug-response assays reveal increased susceptibility of idling DTPs to ferroptosis, a non-apoptotic form of regulated cell death, indicating the emergence of vulnerabilities associated with drug tolerance. Altogether, our results support a population-level view of tumor drug tolerance in which DTPs comprise stable collections of phenotypic states, shaped by treatment-defined phenotypic landscapes, which are potentially vulnerable to subsequent interventions. 
This perspective implies that eradicating DTPs will require a fundamental shift away from cell-type-centric strategies toward sequential treatments that progressively reduce phenotypic heterogeneity by modulating the molecular and cellular processes that establish the DTP landscape, an approach previously termed "targeted landscaping."

16
Molecular signature of pediatric B-ALL determines outcomes post CD19 CAR-T cell therapy

Oszer, A.; Pastorczak, A.; Urbanska, Z.; Miarka, K.; Marschollek, P.; Richert-Przygonska, M.; Mielcarek-Siedziuk, M.; Baggott, C.; Schultz, L.; Moon, J.; Aftandilian, C.; Styczynski, J.; Kalwak, K.; Mlynarski, W.; Davis, K. L.

2026-04-13 oncology 10.64898/2026.04.11.26350681 medRxiv
Top 13%
0.1%

Chimeric antigen receptor T-cell (CAR-T) therapy targeting CD19 has transformed outcomes for children with relapsed or refractory (R/R) B-cell acute lymphoblastic leukemia (B-ALL), yet the influence of molecular subtype on outcomes remains unclear. We evaluated the impact of cytogenetic and molecular signatures on complete response (CR), overall survival (OS), and leukemia-free survival (LFS) after CD19 CAR-T therapy in eighty-six pediatric patients with R/R B-ALL treated with tisagenlecleucel. CR was assessed 30 days after infusion. Cytogenetic data were available for 84 patients and molecular profiling for 62. Survival analyses included 72 patients who received CD19 CAR-T as the sole cellular therapy. Seventy-seven patients achieved CR (89.5%). Pre-infusion bone marrow blasts of ≥20% were associated with lower CR rates (53.8% vs 95.9%, p<0.0001) and significantly reduced OS and LFS (both p<0.0001). Among molecular markers, RAS mutations correlated with inferior OS (p=0.0222) and LFS (p=0.0402). In multivariate analysis, bone marrow blasts >20% and RAS mutations independently predicted inferior OS. Post CAR-T, CD19-negative relapses showed an almost twofold higher prevalence of RAS mutations (66% vs 37.5%). These findings highlight RAS mutations as a key molecular predictor of outcome after CD19 CAR-T therapy and suggest emergence of unique risk stratification for patients receiving CD19-targeting therapy.

17
Shared inheritance reveals landscape of somatic and germline cancer risk in TP53

MacGregor, H. A. J.; Blundell, J. R.; Easton, D. F.

2026-04-11 genetic and genomic medicine 10.64898/2026.04.10.26350605 medRxiv
Top 13%
0.1%

Pathogenic variants in TP53, the key tumour-suppressor gene underlying Li-Fraumeni syndrome (LFS), are among the best-established causes of inherited cancer predisposition. However, large-scale sequencing has revealed that many apparently pathogenic TP53 variants detected in blood are the result of somatic clonal expansions, complicating risk interpretation. Using blood-derived whole-exome data from 469,391 UK Biobank participants, we combined variant allele fraction (VAF) with haplotype-sharing analysis to distinguish germline and somatic TP53 variants. Germline variants were concentrated at sites linked to partial loss of p53 function and lower disease penetrance, whereas classic LFS alleles appeared almost entirely somatic. High-VAF carriers of classic LFS alleles conferred markedly increased risk of haematological malignancy but not solid tumours, consistent with large TP53-mutant clonal expansions. The prevalence of somatic clonal expansion also correlated with missense variant pathogenicity, suggesting that somatic activity provides an informative in vivo proxy for functional impact. These results provide new insights into TP53-associated cancer risk at the population level, demonstrate that somatic rather than germline risk predominates in middle-aged healthy adults and provide a scalable framework for variant classification in large-scale population genomics.

18
Drug response profiling guides precision therapy in relapsed and refractory childhood acute lymphoblastic leukemia

Steffen, F. D.; Lissat, A.; Alten, J.; Kriston, A.; Scheidegger, N.; Eckert, C.; Bodmer, N.; Schori, L.; Schühle, S.; Arpagaus, A.; Gutnik, S.; Manioti, D.; Bruderer, N.; Zeckanovic, A.; Västrik, I.; Nyiri, G.; Kovacs, F.; Thorhauge Als-Nielsen, B. E.; Attarbaschi, A.; Rademacher, A.; Elitzur, S.; Jacoby, E.; De Moerloose, B.; Svenberg, P.; Ancliff, P.; Sramkova, L.; Buldini, B.; Balduzzi, A.; Boer, J. M.; Mielcarek, M.; Ceppi, F.; Ansari, M.; Halter, J.; Schmiegelow, K.; Locatelli, F.; DelBufalo, F.; Stanulla, M.; Kulozik, A. E.; Schrappe, M.; Rohrlich, P.; Cave, H.; Baruchel, A.; von Stack

2026-04-11 oncology 10.64898/2026.04.08.26350164 medRxiv
Top 14%
0.1%

Children with relapsed or refractory acute lymphoblastic leukemia (ALL) require more effective and less toxic therapies. We established a prospective, multicenter Drug Response Profiling (DRP) registry (NCT06550102) integrating functional testing into precision-guided treatment. DRP was performed for 340 patients from 17 European countries with a turnaround time of two weeks. Image-based drug screening with over 135,000 unique perturbations revealed a heterogeneous landscape of ex vivo responses to 88 drugs on average. Ranking drug responses across the patient cohort defined individual drug fingerprints, identifying "DRP twins" by similarity in sensitivity and resistance independent of genetic ALL subtypes. Of 239 high-risk patients with follow-up, DRP-informed interventions were reported for 63 patients (26%). Patients received combination therapies based on venetoclax, tyrosine kinase inhibitors, trametinib, bortezomib or selinexor, resulting in objective clinical responses in 43 cases (68%). Precision-guided treatments allowed bridging to cellular therapies in 42 patients, among whom 28 (67%) were still alive with a median follow-up of 21 months after DRP (IQR: 14.7-26.6 months). Top responders to venetoclax, ranked within the first tertile of the cohort, had superior 1-year event-survival compared to venetoclax non-responders (0.57 [95% CI, 0.39-0.85] vs. 0.25 [95% CI, 0.11-0.58]). Collectively, these findings demonstrate the feasibility and clinical relevance of functional profiling within an international network. This scalable framework enables individualized therapy selection for enrolment in adaptive precision trials for high-risk pediatric ALL.

19
Patient-Centred Communication in Lung Cancer Screening: A Clinically Focussed Evaluation of a Fine-Tuned Open-Source Model Against a Larger Frontier System

Khanna, S.; Chaudhary, R.; Narula, N.; Lee, R.

2026-04-11 oncology 10.64898/2026.04.10.26350595 medRxiv
Top 15%
0.0%

Lung cancer screening saves lives, yet uptake remains suboptimal and inequitable. Personalised communication can improve attendance and reduce anxiety, but scaling such support is a workforce challenge. We fine-tuned Google's Gemma 2 9B using QLoRA on 5,086 synthetic screening conversations and compared it against Google's Gemini 2.5 Flash (a larger frontier model) and an unmodified baseline across 300 multi-turn conversations with 100 patient personas spanning ten clinical categories. Evaluation combined automated natural language processing metrics with independent language model judgement in two complementary modes: structured clinical rubric and simulated patient persona. The fine-tuned model achieved the highest simulated patient experience score (3.71/5 vs 3.65 for the frontier model), recorded zero boundary violations after clinician review of all flagged instances, and led on the four most safety-critical categories. A composite Patient Adaptation Index showed that the fine-tuned model led overall (0.37 vs 0.35 vs 0.35), with its clearest advantage on the two clinically specific components: empathy calibration to patient distress and selective smoking cessation signposting. These findings suggest that targeted fine-tuning of open-source models can yield clinical communication quality comparable to larger proprietary systems, with advantages in safety-critical scenarios and suitability for NHS data governance constraints. Human clinician review of these conversations is ongoing.

20
Leveraging State-of-the-Art LLMs for the De-identification of Sensitive Health Information in Clinical Speech

Dai, H.-J.; Mir, T. H.; Fang, L.-C.; Chen, C.-T.; Feng, H.-H.; Lai, J.-R.; Hsu, H.-C.; Nandy, P.; Panchal, O.; Liao, W.-H.; Tien, Y.-Z.; Chen, P.-Z.; Lin, Y.-R.; Jonnagaddala, J.

2026-04-17 health informatics 10.64898/2026.04.13.26349911 medRxiv
Top 15%
0.0%

Accurate recognition and de-identification of sensitive health information (SHI) in spoken dialogues requires multimodal algorithms that can understand medical language and contextual nuance. However, errors in recognition and de-identification risk exposing SHI. Additionally, the variability and complexity of medical terminology, along with the inherent biases in medical datasets, further complicate this task. This study introduces the SREDH/AI-Cup 2025 Medical Speech Sensitive Information Recognition Challenge, which focuses on two tasks: Task 1, speech transcription, in which systems must accurately transcribe speech into text; and Task 2, medical speech de-identification, in which systems must detect and appropriately classify mentions of SHI. The competition attracted 246 teams; top-performing systems achieved a mixed error rate (MER) of 0.1147 and a macro F1-score of 0.7103, with average MER and macro F1-score of 0.3539 and 0.2696, respectively. Results were presented at the IW-DMRN workshop in 2025. Notably, the results reveal that LLMs were prevalent across both tasks: 97.5% of teams adopted LLMs for Task 1 and 100% for Task 2, highlighting their growing role in healthcare. Furthermore, we fine-tuned six models, demonstrating strong precision (~0.885-0.889) with slightly lower recall (~0.830-0.847), resulting in F1-scores of 0.857-0.867.